Abstract

Computational protein design aims at constructing novel or improved func- 
tions on the structure of a given protein backbone and has important applica- 
tions in the pharmaceutical and biotechnical industry. The underlying combi- 
natorial side-chain placement problem consists of choosing a side-chain place- 
ment for each residue position such that the resulting overall energy is mini- 
mum. The choice of the side-chain then also determines the amino acid for 
this position. Many algorithms for this NP-hard problem have been proposed
in the context of homology modeling, which, however, reach their limits when 
faced with large protein design instances. 

In this paper, we propose a new exact method for the side-chain placement 
problem that works well even for large instance sizes as they appear in pro- 
tein design. Our main contribution is a dedicated branch-and-bound algorithm 
that combines tight upper and lower bounds resulting from a novel Lagrangian 
relaxation approach for side-chain placement. Our experimental results show 
that our method outperforms alternative state-of-the-art exact approaches and 
makes it possible to optimally solve large protein design instances routinely. 

1 Introduction 

Protein design aims at constructing novel or improved functions on the structure 
of a given protein backbone. Since proteins are key players in virtually all biological 
processes, the ability to design proteins is of great practical interest, e.g., to the
pharmaceutical and biotechnological industry. Experimental protein design methods,
such as directed evolution [3], have been applied successfully. However, since
experimental methods are time- and money-consuming, computational approaches
are an attractive alternative. 

Computational protein design is related to the side-chain placement (SCP) prob- 
lem in protein homology modeling. Given the modeled backbone of a protein, the 
amino acid side-chains have to be placed on this backbone in the energetically most
favorable conformation. Two assumptions are commonly made: (i) side-chains
adopt only statistically dominant low-energy side-chain conformations, the so-called
rotamers [10], and (ii) the energy of a protein is the sum of intrinsic side-chain
energies and pairwise interaction energies. These assumptions lead to the following
discrete optimization problem: For each residue position choose a rotamer such 
that the total energy of the protein is minimum. This problem has been shown to be 
NP-hard [20] and inapproximable [5].

In protein design the candidate rotamers at each position do not only come 
from a single amino acid but from several potential amino acids, yielding very large 
problem instances. Previous in silico approaches to protein design differ in their 
choice of rotamer library, energy function, and optimization method. Utilization 
of a higher-resolution rotamer library and a more accurate energy function will improve
the results. On the other hand, it will increase computation time and problem
size. Regarding the optimization methods, computational protein design approaches
generally employ computationally expensive heuristics such as the Monte
Carlo method [7, 6, 21]. Other heuristics, which have been proposed for SCP in protein
homology modeling, could also be applied [24, 25, 9, 28, 22]. However, Voigt
et al. [23] have shown that these inexact algorithms become less accurate with increasing
problem size. Thus, exact methods capable of solving large protein design
instances are desirable. Several approaches to solving the SCP problem exactly
have been proposed, including dead end elimination [8, 11, 19] (combined with systematic
search [17] or residue reduction [26]), integer linear programming [1, 14],
branch-and-bound [4, 24] and tree decomposition [27]. While most of these approaches
work well for homology modeling, they reach their limits when applied to
protein design. 

In this paper, we propose a novel exact method for SCP that works well even for 
large instance sizes as they appear in protein design. After presenting the combinatorial
problem formally in Section 2, we describe our new method in Section 3.
Our main contribution is a dedicated branch-and-bound algorithm that combines
tight upper and lower bounds resulting from a novel Lagrangian relaxation approach
for SCP. In Section 4, we present and discuss our experimental results, in which
we show that our method outperforms alternative state-of-the-art exact approaches 
and makes it possible to optimally solve large protein design instances routinely. 

2 Combinatorial Problem Formulation and Notation 

We study the following graph-theoretic formulation of the side-chain placement
problem: 

Problem 1 (SCP). Given a $k$-partite graph $G = (V, E)$, $V = V_1 \cup \dots \cup V_k$, with node costs
$c_v$, $v \in V$, and edge costs $c_{uv}$, $uv \in E$, determine an assignment $a : \{1, \dots, k\} \to V$ with
$a(i) \in V_i$, $1 \le i \le k$, such that the cost

$$\sum_{i=1}^{k} c_{a(i)} + \sum_{1 \le i < j \le k,\ a(i)a(j) \in E} c_{a(i)a(j)}$$

of the induced graph is minimum.

Here, each node set $V_i$ corresponds to the candidate rotamers for the residue
at position $i$. Node costs model self energies of rotamers and edge costs model
interaction energies between pairs of rotamers. A solution is given by selecting for
each residue position $i$, $1 \le i \le k$, exactly one rotamer $a(i)$. Clearly, the choice of
the rotamer also determines the amino acid at this position.

In the description of our algorithm we will also use a function $r : V \to \{1, \dots, k\}$
that denotes the residue position of a rotamer $v$, that is, $r(v) = i$ if and only if $v \in V_i$.
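
To make the objective of Problem 1 concrete, the following minimal Python sketch evaluates the cost of the subgraph induced by an assignment. The data layout (`node_cost` as one list of self energies per position, `pair_cost` as a dictionary keyed by position pairs `(i, j)` with `i < j`), the function name `scp_energy`, the 0-based numbering of positions, and the toy numbers are illustrative assumptions and not part of the formulation above.

```python
from itertools import combinations

def scp_energy(node_cost, pair_cost, assignment):
    """Total cost of the subgraph induced by one rotamer per position."""
    k = len(node_cost)
    total = sum(node_cost[i][assignment[i]] for i in range(k))
    for i, j in combinations(range(k), 2):
        table = pair_cost.get((i, j))  # absent pair: no edges, contributes 0
        if table is not None:
            total += table[assignment[i]][assignment[j]]
    return total

# Toy instance (hypothetical numbers): 3 positions with 2, 2 and 1 rotamers.
node_cost = [[1.0, 0.5], [0.2, 0.0], [0.3]]
pair_cost = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]],
    (1, 2): [[0.4], [-0.2]],
}
energy = scp_energy(node_cost, pair_cost, [1, 0, 0])
```

Here `assignment[i]` plays the role of $a(i)$, and a missing entry in `pair_cost` corresponds to a pair of positions without edges.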



3 Lagrangian Relaxation Based Branch-and-Bound 

We now present our novel approach to solve the SCP problem to provable optimal- 
ity. Its core is the computation of sharp upper and lower bounds using a novel La- 
grangian relaxation technique within a dedicated branch-and-bound approach. 

3.1 Upper and Lower Bounds by Lagrangian Relaxation 

Our relaxation builds on an integer linear programming (ILP) formulation for SCP 
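that has been introduced by Althaus et al. [1] and extended by Kingsford et al. [14]:

$$\min \sum_{v \in V} c_v x_v + \sum_{uv \in E'} c_{uv} y_{uv} \tag{1}$$

$$\text{s.t.} \quad \sum_{v \in V_i} x_v = 1 \qquad 1 \le i \le k \tag{2}$$

$$\sum_{u \in V_i} y_{uv} = x_v \qquad 1 \le i \le k,\ \text{for all } v \in V_j \text{ with } j \in \mathcal{N}_i^+ \tag{3}$$

$$\sum_{u \in V_i:\ c_{uv} < 0} y_{uv} \le x_v \qquad 1 \le i \le k,\ \text{for all } v \in V_j \text{ with } j \notin \mathcal{N}_i^+ \tag{4}$$

$$x_v \in \{0, 1\} \qquad \text{for all } v \in V \tag{5}$$

$$y_{uv} \in \{0, 1\} \qquad \text{for all } uv \in E' \tag{6}$$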

The formulation contains binary variables $x_v$, for nodes $v \in V$, and $y_{uv}$, for edges
$uv \in E$, with the interpretation that a variable is 1 if the corresponding node or
edge is part of the induced subgraph and 0 otherwise. Constraints (2) express that
exactly one rotamer must be chosen per residue position. Constraints (3) and (4)
link node and edge variables. When a rotamer $v$ of a residue position $j$ is chosen,
i.e., $x_v = 1$, exactly one incident edge from each other residue position $i \ne j$ must
be chosen as well for residue positions $i$ that share positively weighted edges with
residue position $j$, i.e., $j \in \mathcal{N}_i^+ := \{ l \in \{1, \dots, k\} \mid \exists\, uv \in E,\ c_{uv} > 0,\ u \in V_i,\ v \in V_l \}$. If
the two residue positions $i$ and $j$ are linked only by non-positive edges, i.e., $j \notin \mathcal{N}_i^+$,
the relaxed constraints (4) apply: zero-weighted edges do not have to be forced to
be in the solution and the corresponding variables can be removed from the ILP. Let
$E' := E \setminus \{ uv \in E \mid u \in V_i,\ v \in V_j,\ j \notin \mathcal{N}_i^+,\ c_{uv} = 0 \}$ be the set of remaining edges.

The distinction between pairs of residues $i, j$ with $j \in \mathcal{N}_i^+$ and pairs $i, j$ with
$j \notin \mathcal{N}_i^+$ in constraints (3) and (4) leads to a considerably smaller number of variables
in practice and to a much better performance. Nevertheless, it is not crucial for the
understanding of our approach and we thus drop this distinction in the remainder
of this work and treat all pairs of residues as in constraint (3) for the sake of clarity
of the presentation, that is, without removing any variables. In our implementation,
however, we treat constraints (3) and (4) differently.

While previous work [1, 14] focuses on solving the linear programming (LP) relaxation
of (1)-(6), we propose a Lagrangian relaxation approach. In the case of SCP
this leads to a much more efficient algorithm, because we exploit structural knowl- 
edge of the SCP problem. The idea of Lagrangian relaxation is to relax constraints of 
an intractable problem, e.g., the SCP ILP, such that the relaxed problem can be solved 
efficiently. The relaxed constraints are moved to the objective function, penalized 
by so-called Lagrangian multipliers. An optimal solution of the original problem, 
i.e., an energy-minimum choice of candidate rotamers, is also a solution of the relaxed
problem, and every optimal solution of the relaxed problem provides a lower
bound on the optimal score of the original problem. The Lagrangian multipliers 
are adjusted iteratively such that the lower bound increases gradually. Also, after 
each iteration, we can evaluate the solution of the relaxed problem and thus obtain 
a new upper bound to the SCP problem. During the iterative process, the lowest 
upper and highest lower bound move closer and closer together. If they coincide, a 
provably optimal SCP has been found. Otherwise, we stop the process after a fixed 
number of iterations and use the bounds within the branch-and-bound framework. 

The key idea of our Lagrangian relaxation approach is to define a total order, de- 
noted by <, on the residue positions, to split the constraints that link node and edge 
variables into a left and a right part and then relax the right part of the constraints. 
W.l.o.g. and for ease of notation we assume that the residue positions have already 
been ordered, that is, residue position i denotes the ith residue position according 
to <. 

First, we rewrite constraints (3) as left and right parts, that is, (3) becomes 

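$$\sum_{u \in V_i} y_{uv} = x_v \qquad 1 \le i \le k - 1,\ \text{for all } v \in V_j \text{ with } j > i \tag{7}$$

$$\sum_{u \in V_i} y_{uv} = x_v \qquad 2 \le i \le k,\ \text{for all } v \in V_j \text{ with } j < i \tag{8}$$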

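We dualize constraints (8) for all non-neighboring residues, i.e., constraints for which
$i > j + 1$, with Lagrangian multipliers $\lambda_v^i$. To simplify notation, we further introduce
$\lambda_v^i := 0$ for neighboring residues, i.e., for which $r(v) + 1 = i$ holds. For fixed
Lagrangian multipliers $\lambda$ we obtain the following relaxation, which we denote by $(LR_\lambda)$:

$$\min \sum_{v \in V} \Big( c_v + \sum_{i > r(v)+1} \lambda_v^i \Big) x_v + \sum_{uv \in E,\ r(u) < r(v)} \big( c_{uv} - \lambda_u^{r(v)} \big) y_{uv} \tag{9}$$

$$\text{s.t.} \quad \sum_{v \in V_i} x_v = 1 \qquad 1 \le i \le k \tag{10}$$

$$\sum_{u \in V_i} y_{uv} = x_v \qquad 1 \le i \le k - 1,\ \text{for all } v \in V_j \text{ with } j > i \tag{11}$$

$$\sum_{u \in V_i} y_{uv} = x_v \qquad 2 \le i \le k,\ \text{for all } v \in V_{i-1} \tag{12}$$

$$x_v, y_{uv} \in \{0, 1\} \tag{13}$$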

Note that a distinction between pairs of residues according to constraints (3) and (4)
requires the Lagrangian multipliers $\lambda_v^i$ associated with constraints (4) to be restricted
in sign in order to guarantee that an optimal solution to $(LR_\lambda)$ yields a lower bound
on the optimal score of the SCP problem.

An integral variable assignment that satisfies constraints (10), (12), and constraints
(11) for neighboring residues, i.e., for $j = i + 1$, encodes a path $p$ in the
corresponding $k$-partite graph from a node in $V_1$ to a node in $V_k$ that traverses
exclusively edges between neighboring residues. The remaining constraints of (11)
involve $y$-variables that do not appear in any other constraint and can thus be chosen
independently of each other. In other words, we can determine the best possible
contribution of a vertex $v$ to the overall objective value, under the assumption that $v$
lies on path $p$, by simply picking for every residue $i < r(v) - 1$ the edge of minimum
weight between $v$ and a node in $V_i$. More formally, we define the profit $\delta$ of a node
$v$ as

$$\delta(v) = \Big( c_v + \sum_{i > r(v)+1} \lambda_v^i \Big) + \sum_{i=1}^{r(v)-2} \min_{u \in V_i} \big( c_{uv} - \lambda_u^{r(v)} \big),$$

where the first term in brackets denotes the coefficient of variable $x_v$ in the objective
function. Then the score of a feasible solution to $(LR_\lambda)$ that induces a path
$p = (v_1, v_2, \dots, v_k)$ with $v_i \in V_i$ in graph $G$ is

$$\sum_{i=1}^{k} \delta(v_i) + \sum_{i=1}^{k-1} c_{v_i v_{i+1}}.$$

Let graph $G'$ be derived from $G$ by removing all edges between non-neighboring
residues, i.e., edges $u'v'$ with $|r(u') - r(v')| > 1$, and by defining the weights of the
remaining edges $uv$ as $c_{uv} + \delta(v)$. Then, an optimal solution to $(LR_\lambda)$ corresponds
to a shortest path in $G'$ from a node in $V_1$ to a node in $V_k$, see Figure 1.

Theorem 1. An optimal solution to $(LR_\lambda)$ can be computed in time $\mathcal{O}(|V|^2)$.

Proof. The profits of all nodes can clearly be computed in time $\mathcal{O}(|V|^2)$. Graph $G'$
is acyclic and thus a shortest path can be computed in time linear in the number of
edges in $G'$, i.e., $\mathcal{O}(\sum_{i=1}^{k-1} |V_i| \cdot |V_{i+1}|)$. Note that a topological sorting of the vertices is
implicitly given by the $k$-partition. □

Figure 1: The structure of a feasible solution to $(LR_\lambda)$. The polygon drawn in solid line denotes
the corresponding path $p = (v_1, v_2, v_3, v_4)$. Every node on path $p$ has exactly one incident edge
to every residue left of it, except to its direct neighbor, depicted by the dashed lines.
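
The construction behind Theorem 1 can be sketched in a few lines of Python: compute the profit of every node and then run a dynamic program over the layers of the $k$-partite graph. This is a minimal sketch under stated assumptions, not the C++ implementation evaluated in Section 4. Positions are 0-based, the multiplier $\lambda_v^i$ is stored under the hypothetical key `(position of v, rotamer of v, i)` in the dictionary `lam`, missing multipliers count as 0, the profit of the first node of the path is used as its starting distance, and the toy instance is made up.

```python
from itertools import product

def pair(pair_cost, i, a, j, b):
    """Interaction energy between rotamer a at position i and b at position j (i < j)."""
    table = pair_cost.get((i, j))
    return 0.0 if table is None else table[a][b]

def profit(node_cost, pair_cost, lam, j, b):
    """Profit delta(v) of node v = rotamer b at position j."""
    k = len(node_cost)
    # coefficient of x_v: self energy plus multipliers of the dualized constraints of v
    value = node_cost[j][b] + sum(lam.get((j, b, i), 0.0) for i in range(j + 2, k))
    # cheapest edge to every non-neighboring position to the left of v
    for i in range(j - 1):
        value += min(pair(pair_cost, i, a, j, b) - lam.get((i, a, j), 0.0)
                     for a in range(len(node_cost[i])))
    return value

def solve_relaxation(node_cost, pair_cost, lam):
    """Return (lower bound, path) for fixed multipliers by a layer-wise shortest path."""
    k = len(node_cost)
    delta = [[profit(node_cost, pair_cost, lam, j, b)
              for b in range(len(node_cost[j]))] for j in range(k)]
    dist = list(delta[0])              # a path may start at any node of the first layer
    back = []
    for j in range(1, k):
        new_dist, choice = [], []
        for b in range(len(node_cost[j])):
            best = min(range(len(dist)),
                       key=lambda a: dist[a] + pair(pair_cost, j - 1, a, j, b))
            new_dist.append(dist[best] + pair(pair_cost, j - 1, best, j, b) + delta[j][b])
            choice.append(best)
        back.append(choice)
        dist = new_dist
    end = min(range(len(dist)), key=dist.__getitem__)
    path = [end]
    for choice in reversed(back):      # follow predecessor pointers back to layer 0
        path.append(choice[path[-1]])
    return dist[end], path[::-1]

def energy(node_cost, pair_cost, assignment):
    """SCP cost of the rotamer assignment given as one index per position."""
    total = sum(node_cost[i][a] for i, a in enumerate(assignment))
    return total + sum(t[assignment[i]][assignment[j]] for (i, j), t in pair_cost.items())

# Toy instance (hypothetical numbers): 4 positions with 2 rotamers each.
node_cost = [[1.0, 0.5], [0.2, 0.0], [0.3, 0.1], [0.0, 0.4]]
pair_cost = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]], (1, 2): [[0.4, -0.3], [-0.2, 0.6]],
    (0, 2): [[0.5, -0.5], [0.3, 0.2]], (2, 3): [[0.0, 0.1], [0.2, -0.4]],
    (0, 3): [[-0.6, 0.2], [0.1, 0.0]], (1, 3): [[0.3, -0.1], [0.0, 0.2]],
}
lower_bound, path = solve_relaxation(node_cost, pair_cost, lam={})
upper_bound = energy(node_cost, pair_cost, path)   # the path itself is a feasible assignment
optimum = min(energy(node_cost, pair_cost, a)
              for a in product(*(range(len(c)) for c in node_cost)))
```

On any instance the three values satisfy `lower_bound <= optimum <= upper_bound`; the brute-force `optimum` is included only to illustrate this and is feasible only for tiny inputs.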

We apply a standard subgradient optimization technique [12] to find those Lagrangian
multipliers $\lambda_v^i$ that yield the largest lower bound to our relaxation. This
iterative adaptation of the Lagrangian multipliers only requires the profits of a small
fraction of the vertices to be recomputed from scratch in each iteration. In practice,
the shortest path computation for given profits by a simple dynamic programming
scheme dominates the overall running time needed to re-solve the Lagrangian relaxation
$(LR_\lambda)$ for modified multipliers $\lambda_v^i$. In other words, the running time will be
linear in the number of edges in $G'$ rather than $G$.
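
One multiplier update of such a subgradient scheme might look as follows in Python. The subgradient component for a dualized constraint is $x_v - \sum_{u \in V_i} y_{uv}$, evaluated at the current relaxed solution. The step-size rule $t = \theta (UB - LB) / \lVert g \rVert^2$ is a common textbook choice and an assumption here, since the text above does not state which rule is used; the function names, the 0-based positions, and the encoding of the relaxed solution (`path` plus a list of chosen non-neighbor edges) are likewise hypothetical.

```python
def subgradient(path, cross_edges):
    """Sparse subgradient g[(j, b, i)] = x_v - sum_{u in V_i} y_uv for v = (j, b), i > j + 1.

    path[j] is the rotamer chosen at position j. cross_edges lists the chosen
    non-neighbor edges as ((left_pos, left_rot), (right_pos, right_rot)).
    """
    k = len(path)
    g = {}
    for j, b in enumerate(path):            # x_v = 1 exactly for the nodes on the path
        for i in range(j + 2, k):
            g[(j, b, i)] = 1.0
    for (j, b), (i, _) in cross_edges:      # every selected edge uv lowers g by one
        g[(j, b, i)] = g.get((j, b, i), 0.0) - 1.0
    return {key: val for key, val in g.items() if val != 0.0}

def update_multipliers(lam, g, lower, upper, theta=1.0):
    """One ascent step lam + t * g with t = theta * (upper - lower) / ||g||^2."""
    norm2 = sum(val * val for val in g.values())
    if norm2 == 0.0:
        return dict(lam)                    # all dualized constraints hold already
    t = theta * (upper - lower) / norm2
    new_lam = dict(lam)
    for key, val in g.items():
        new_lam[key] = new_lam.get(key, 0.0) + t * val
    return new_lam

# Relaxed solution on 4 positions: the node at position 3 picked rotamer 1 of
# position 0 as its partner, although the path uses rotamer 0 there.
path = [0, 1, 0, 1]
cross_edges = [((0, 0), (2, 0)), ((0, 1), (3, 1)), ((1, 1), (3, 1))]
g = subgradient(path, cross_edges)
new_lam = update_multipliers({}, g, lower=1.0, upper=2.0)
```

Only the multipliers of violated constraints change, which is in line with the remark above that the profits of just a small fraction of the vertices need to be recomputed.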

In order to improve the practical performance of our approach we sort the residues 
by increasing number of rotamers. This ordering results in a minimum total number 
of dualized constraints. 
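
The effect of the ordering can be checked with a short Python sketch. A node at position $j$ contributes one dualized constraint for every position $i > j + 1$, so the total is a weighted sum of the layer sizes in which early positions carry the largest weights. The function name and the layer sizes below are hypothetical, and positions are 0-based.

```python
from itertools import permutations

def dualized_constraints(sizes):
    """Number of dualized constraints for layer sizes listed in residue order."""
    k = len(sizes)
    # a node at 0-based position j has one dualized constraint per position i > j + 1
    return sum(n * max(0, k - j - 2) for j, n in enumerate(sizes))

sizes = [7, 3, 12, 5, 9]                      # rotamers per residue, made-up values
unsorted_count = dualized_constraints(sizes)
sorted_count = dualized_constraints(sorted(sizes))
best_count = min(dualized_constraints(p) for p in permutations(sizes))
```

For these sizes the sorted order needs 26 dualized constraints instead of 39, and no permutation does better.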

3.2 Branch-and-Bound 

We embed our Lagrangian bounding scheme into a branch-and-bound framework 
to obtain an energy-minimum rotamer assignment. The general idea is to divide 
the overall SCP problem into easier subproblems by fixing rotamers at individual 
residue positions and to solve the resulting problems recursively. To avoid a com- 
plete enumeration of all possible rotamer assignments, we employ our Lagrangian 
bounds to prune large parts of the enumeration tree. In particular, let $S_k$ be a subproblem
in which certain residue positions have been assigned a rotamer, and let
$x$ be an arbitrary solution to the original SCP problem. If the lower bound $z_k$ on
the minimal energy assignment for subproblem $S_k$ is larger than $z(x)$, i.e., the total
energy of rotamer assignment $x$, then no optimal solution to the original problem
can be obtained from $S_k$ and we can prune the subtree rooted at $S_k$ from the search
space. 

Branching scheme. Starting from an SCP problem instance $S$, a straightforward
branching rule is to impose the constraint $x_{ij} = 1$ in the left child node of the search
tree, and $x_{ij} = 0$ in the right, i.e., fixing and forbidding a rotamer $j$ at residue position
$i$, respectively. However, since every rotamer $j$ is contained in a constraint (2) for a
residue position $i$, this leads to an unbalanced search tree, because the right child
node leaves $|V_i| - 1$ possible rotamers for residue position $i$, whereas the left child
leaves only one possibility. While partitioning the set of rotamers of a given residue
position into two roughly equally sized sets avoids this imbalance, we observed
a significantly smaller number of nodes in the tree with the following alternative
branching scheme. Instead of creating two subproblems, we create one for
each rotamer of a selected residue position $i$, by fixing rotamer $j_k$ in subproblem $k$.
Only for very large design instances, we first partitioned the sets of possible rotamers
that were larger than some threshold $p$ into two smaller sets. The effectiveness of
this scheme is mainly based on two properties. First, when fixing a rotamer $j$ for a
residue position $i$, we reduce the problem instance by incorporating the interaction
energy between rotamer $j$ and any other rotamer $j'$ of all residue positions $i' \ne i$
into the self energy of $j'$. In contrast, in our relaxation only the profits $\delta$ of rotamers
of residue positions $i' > i$ would take into account a further subdivision of the set of
rotamers of residue position $i$. The rotamer assignment to residue positions $i' < i$
would still rely on a correct choice of the incoming edges for the remaining rotamers
of residue position $i$, which could only be accomplished by iteratively adapting the
Lagrangian penalties. Second, a large enough set of child nodes will give our depth-first
search traversal of the branch-and-bound tree the freedom to pick a promising
node first.
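
The instance reduction used when a rotamer is fixed can be illustrated with a small Python sketch: the interaction energies with the fixed rotamer are folded into the self energies of all other rotamers, and the fixed position keeps a single candidate. The function names, the data layout (`node_cost` as per-position lists, `pair_cost` keyed by `(i, j)` with `i < j`), and the numbers are assumptions made for illustration.

```python
def fix_rotamer(node_cost, pair_cost, pos, rot):
    """Return a reduced instance in which position `pos` is fixed to rotamer `rot`."""
    new_node = [list(costs) for costs in node_cost]
    new_pair = {}
    for (i, j), table in pair_cost.items():
        if i == pos:       # fold row `rot` into the self energies of position j
            for b in range(len(node_cost[j])):
                new_node[j][b] += table[rot][b]
        elif j == pos:     # fold column `rot` into the self energies of position i
            for a in range(len(node_cost[i])):
                new_node[i][a] += table[a][rot]
        else:              # pairs not involving `pos` are kept unchanged
            new_pair[(i, j)] = table
    new_node[pos] = [node_cost[pos][rot]]   # a single remaining candidate
    return new_node, new_pair

def energy(node_cost, pair_cost, assignment):
    """SCP cost of the rotamer assignment given as one index per position."""
    total = sum(node_cost[i][a] for i, a in enumerate(assignment))
    return total + sum(t[assignment[i]][assignment[j]] for (i, j), t in pair_cost.items())

# Toy instance (hypothetical numbers): fix rotamer 0 at the middle position.
node_cost = [[1.0, 0.5], [0.2, 0.0], [0.3, 0.1]]
pair_cost = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]],
    (1, 2): [[0.4, -0.3], [-0.2, 0.6]],
    (0, 2): [[0.5, -0.5], [0.3, 0.2]],
}
reduced_node, reduced_pair = fix_rotamer(node_cost, pair_cost, pos=1, rot=0)
```

Every assignment of the reduced instance (with index 0 at the fixed position) has the same energy as the corresponding assignment of the original instance, so bounds computed on the reduced instance remain valid for the subproblem.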

Choosing a constraint. The question remains which position constraint (2) to choose
for a branching step. We adopt the idea of strong branching [2]. The rough idea is to
estimate the progress, i.e., increase in lower bound, for the residue positions before 
actually branching on one of them. This is done by successively fixing each rotamer 
of a given residue position and solving the resulting Lagrangian subproblem. Based 
on the progress of the single rotamers, we compute an overall score of the residue 
position, see below, and pick the one with the highest score as the next residue po- 
sition to branch on. Since the computation time per node of this procedure would 
be enormous, we try to estimate the locally best residue position by simplifying this 
search in three different aspects. 

• First, we restrict the evaluation to the most promising residue positions. More 
precisely, we order the residue positions in increasing order of the maximum
(primal) fractional value of their rotamers. We recover the primal solutions
by taking the convex combinations of the last $k$ solutions $x^*$ produced
in the course of the subgradient optimization. Then, at a node of depth $l$ of
the branch-and-bound tree, we consider the first percent of the sorted
residue positions. Note that the concept of a residue position with minimal
maximal fractional value of its rotamer variables can be considered as a generalization
of the most infeasible branching rule for binary variables to constraints
of the form (2).

• Second, to estimate the increase of the objective function when fixing a ro- 
tamer, only a few subgradient iterations are performed, along with an aggres- 
sive multiplier adjustment. 

• Finally, we choose our scoring function of the residue positions in such a way
that it can be computed quickly while still giving a good estimate of the
overall progress in the dual (lower) bound. Let $\Omega$ be the subproblem corresponding
to the current node of the tree and let problem $\Omega_j$ be obtained from
$\Omega$ by fixing rotamer $j$ in residue $r_i$. Then the score of a residue position $r_i$ is
given by

$$\xi(r_i) = \min_j \Delta_j,$$

where $\Delta_j = z(\Omega_j) - z(\Omega)$. To determine the score of a given residue position $i$,
we test the rotamers in decreasing order of their fractional values. The goal is
to evaluate rotamers $j$ with small progress $\Delta_j$ first, since the subgradient optimization
for the remaining rotamers can be aborted as soon as their increase
in objective function exceeds the smallest progress seen so far. Also notice that
sorting the residue positions as described above is beneficial for the computation
of the residue scores. Whenever we encounter a rotamer with a progress
$\Delta_j$ that is smaller than the smallest progress determined for a previous residue
position $i' \ne i$, we do not have to consider the remaining rotamers of $i$, since
we are interested in the residue position with maximal score.
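
The selection rule of the last item can be sketched as follows. The callback `progress(pos, j)`, which stands for the increase in lower bound after fixing rotamer `j` at position `pos`, is a hypothetical placeholder for a few subgradient iterations; the function names and the table of progress values are made up. Only the second kind of early termination (giving up on a position that can no longer reach the maximum score) is modeled here.

```python
def position_score(rotamers, progress, best_so_far=float("-inf")):
    """Score of a residue position: the minimum progress over its rotamers.

    Evaluation stops as soon as the position cannot beat `best_so_far` anymore.
    """
    score = float("inf")
    for j in rotamers:
        score = min(score, progress(j))
        if score < best_so_far:
            break          # this position cannot have the maximum score anymore
    return score

def choose_position(candidates, progress):
    """candidates maps a position to its rotamers, ordered by decreasing fractional value."""
    best_pos, best_score = None, float("-inf")
    for pos, rotamers in candidates.items():
        s = position_score(rotamers, lambda j: progress(pos, j), best_score)
        if s > best_score:
            best_pos, best_score = pos, s
    return best_pos, best_score

# Made-up progress values for three residue positions.
table = {0: [0.5, 0.2, 0.9], 1: [0.4, 0.6], 2: [0.1, 5.0]}
calls = []

def progress(pos, j):
    calls.append((pos, j))     # record how many bound evaluations are needed
    return table[pos][j]

candidates = {pos: range(len(values)) for pos, values in table.items()}
best_pos, best_score = choose_position(candidates, progress)
```

In this toy run position 1 is selected, and the second rotamer of position 2 is never evaluated because its first rotamer already falls below the best score seen so far.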

Choosing a node. A primal feasible solution that gives a good upper bound on 
the minimum total energy is necessary to prune the enumeration tree significantly. 
Therefore we follow a depth-first search (DFS) strategy to descend in the branch- 
and-bound tree as quickly as possible, increasing the chances of finding a new and 
hopefully better feasible solution. Furthermore, the Lagrangian subproblems cor- 
responding to a node and to one of its immediate descendants differ only in one 
residue position. Therefore, a subproblem can be re-solved faster when starting from
the multiplier vector $\lambda$ determined in the immediate parent node. On the downside,
once being in a wrong branch, one may spend a long time in this subtree before get- 
ting back on a path leading to an improved solution. We thus combine the advan- 
tages of DFS and a best-node first strategy. We fix the rotamers of a given residue 
position in increasing order of their dual (lower) bounds. Following this approach 
led to a considerably smaller number of nodes evaluated in the tree, and we were 
able to find the optimal solution much faster in most of the cases. 
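
The interplay of depth-first search, one child per rotamer, and children ordered by their lower bounds can be illustrated with a compact Python sketch. Two simplifications are assumptions of this sketch and differ from the method described above: the lower bound is a naive one that minimizes every self and pair term independently (a stand-in for the Lagrangian bound), and positions are branched on in index order instead of being chosen by strong branching. All names and numbers are hypothetical.

```python
from itertools import product

def pair(pair_cost, i, a, j, b):
    """Interaction energy between rotamer a at position i and b at position j (i < j)."""
    table = pair_cost.get((i, j))
    return 0.0 if table is None else table[a][b]

def naive_bound(node_cost, pair_cost, fixed):
    """Lower bound when positions 0..len(fixed)-1 carry the rotamers in `fixed`.

    Every self and pair term is minimized independently; for a complete
    assignment the bound equals the exact energy.
    """
    k = len(node_cost)
    dom = [[fixed[i]] if i < len(fixed) else range(len(node_cost[i])) for i in range(k)]
    bound = sum(min(node_cost[i][a] for a in dom[i]) for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):
            bound += min(pair(pair_cost, i, a, j, b) for a in dom[i] for b in dom[j])
    return bound

def branch_and_bound(node_cost, pair_cost):
    k = len(node_cost)
    best = {"energy": float("inf"), "assignment": None, "nodes": 0}

    def visit(fixed):
        best["nodes"] += 1
        if len(fixed) == k:                    # leaf: the bound is the exact energy
            value = naive_bound(node_cost, pair_cost, fixed)
            if value < best["energy"]:
                best["energy"], best["assignment"] = value, list(fixed)
            return
        pos = len(fixed)                       # simplification: branch in index order
        children = sorted((naive_bound(node_cost, pair_cost, fixed + [a]), a)
                          for a in range(len(node_cost[pos])))
        for lower, a in children:              # most promising child first
            if lower >= best["energy"]:
                break                          # sorted order: later children are pruned too
            visit(fixed + [a])

    visit([])
    return best

# Toy instance (hypothetical numbers) and a brute-force check.
node_cost = [[1.0, 0.5], [0.2, 0.0], [0.3, 0.1], [0.0, 0.4]]
pair_cost = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]], (1, 2): [[0.4, -0.3], [-0.2, 0.6]],
    (0, 2): [[0.5, -0.5], [0.3, 0.2]], (2, 3): [[0.0, 0.1], [0.2, -0.4]],
}
result = branch_and_bound(node_cost, pair_cost)
optimum = min(naive_bound(node_cost, pair_cost, list(a))
              for a in product(*(range(len(c)) for c in node_cost)))
```

Visiting the child with the smallest lower bound first tends to produce a good incumbent early, which in turn lets the sorted loop cut off all remaining siblings with a single comparison.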



4 Experimental Results 

We have implemented our combinatorial algorithm for finding an optimal solution
to the Lagrangian relaxation $(LR_\lambda)$ in C++ using the LEDA and BALL libraries [18, 13].
We iteratively improve the obtained lower bounds on the optimal solution of
the SCP problem by applying a standard subgradient approach. In each iteration,
we derive from the Lagrangian solution a feasible solution to the original problem
and thus an upper bound on the optimal score by evaluating the subgraph induced
by the nodes lying on the shortest path from $V_1$ to $V_k$, see Section 3.1. We exploit the
upper and lower bounds in a branch-and-bound manner to prune large parts of the
search space and to derive a provably optimal solution to the SCP problem.

In order to determine an initial upper bound for the branch-and-bound framework,
we employ a simple local search procedure: Given an initial configuration
in which each residue position is assigned the rotamer with the lowest self energy,
residue positions are selected randomly and optimized, i.e., the respective position
is assigned the rotamer yielding the best energy within the current conformation.
This minimization proceeds until the energy has not improved several times in
a row or a maximum of 100 iterations is reached.
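
A local search of this kind can be sketched in Python as follows. The parameter `patience` (the number of consecutive non-improving steps after which the search stops) and the fixed random seed are assumptions, since the text only says "several times in a row"; the function names, the data layout, and the toy instance are hypothetical as well.

```python
import random

def pair(pair_cost, i, a, j, b):
    """Interaction energy between rotamer a at position i and b at position j (any order)."""
    if i > j:
        i, a, j, b = j, b, i, a
    table = pair_cost.get((i, j))
    return 0.0 if table is None else table[a][b]

def local_energy(node_cost, pair_cost, assignment, i, a):
    """Sum of all energy terms that involve position i when it carries rotamer a."""
    return node_cost[i][a] + sum(pair(pair_cost, i, a, j, b)
                                 for j, b in enumerate(assignment) if j != i)

def total_energy(node_cost, pair_cost, assignment):
    total = sum(node_cost[i][a] for i, a in enumerate(assignment))
    return total + sum(t[assignment[i]][assignment[j]] for (i, j), t in pair_cost.items())

def initial_assignment(node_cost):
    """Rotamer with the lowest self energy at every position."""
    return [min(range(len(c)), key=c.__getitem__) for c in node_cost]

def local_search(node_cost, pair_cost, max_iter=100, patience=10, seed=0):
    rng = random.Random(seed)
    assignment = initial_assignment(node_cost)
    failures = 0
    for _ in range(max_iter):
        i = rng.randrange(len(node_cost))      # pick a random residue position
        current = local_energy(node_cost, pair_cost, assignment, i, assignment[i])
        best = min(range(len(node_cost[i])),
                   key=lambda a: local_energy(node_cost, pair_cost, assignment, i, a))
        if local_energy(node_cost, pair_cost, assignment, i, best) < current:
            assignment[i] = best               # strict improvement of the total energy
            failures = 0
        else:
            failures += 1
            if failures >= patience:
                break
    return assignment

# Toy instance (hypothetical numbers).
node_cost = [[1.0, 0.5], [0.2, 0.0], [0.3, 0.1]]
pair_cost = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.5]],
    (1, 2): [[0.4, -0.3], [-0.2, 0.6]],
    (0, 2): [[0.5, -0.5], [0.3, 0.2]],
}
start_energy = total_energy(node_cost, pair_cost, initial_assignment(node_cost))
assignment = local_search(node_cost, pair_cost)
final_energy = total_energy(node_cost, pair_cost, assignment)
```

Because a move is accepted only if it strictly lowers the local energy, the total energy never increases, so `final_energy` is a valid upper bound that is at least as good as the starting configuration.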

The only state-of-the-art exact method for SCP that can cope with protein design
instances is the ILP based method proposed by Kingsford et al. [14]. Available software
packages for DEE or treewidth based approaches such as R3 [26] or TreePack
[27] do not allow several candidate amino acids at each position and are thus not
applicable to protein design instances. Furthermore, our experiments show that even
small protein design instances already have treewidths of 10 to 20 as compared to 3
to 4 for most homology modeling instances [27]. Since the complexity of the TreePack
algorithm grows exponentially in the treewidth, a reasonable performance on
protein design instances is not to be expected. For DEE-based methods, a similar
argument holds because reduced protein design instances are still too large to be
processed in reasonable time by residue unification or other enumeration techniques.
We therefore compare our Lagrangian based approach only to an implementation
that solves the ILP proposed by Kingsford et al. [14] using CPLEX 12.2¹ with Concert
Technology.

In our experiments, we used two different benchmark sets. The first set con- 
sists of protein design energy files provided by Kingsford et al. [14]. It comprises 
25 proteins with 11 to 124 flexible residue positions. Surface residues are fixed. At
each core position up to six different amino acids are allowed. The employed energy 
function comprises statistical potentials and van der Waals interactions. We omit 
the experimental results on the simpler homology modeling instances, since almost 
all of these instances can be solved in a fraction of a second by both our Lagrangian 
relaxation approach and the CPLEX based method. The second set of protein de- 
sign instances was taken from Yanover et al. [28] . This set comprises 97 proteins 
with 40 to 180 amino acids. All residue positions are flexible and at each position all 
20 amino acids are allowed, yielding very large problem instances. Here, the more
realistic Rosetta energy function [16] was used to determine self and interaction
energies.

In a preprocessing phase, we apply established rules [11, 26] to decrease the size
of the problem instances while preserving optimality properties. Tables 1 and 2
show the running times of our Lagrangian relaxation branch-and-bound approach 
and the CPLEX based method using default settings on the resulting instances on a 
compute cluster with two 2.26 GHz Intel Quad Core processors with 24 GB of RAM 
on each node, running 64 bit Linux. We applied a time limit of 12 hours and a mem- 
ory limit of 16 GB. Computations exceeding one of these limits were aborted. 

The first three columns of the tables give the characteristics of the instances,
i.e., their PDB identifier, the number of residues and the total number of rotamers.
The following two columns give characteristics of the branch-and-bound procedure:
Columns N and H give the total number of evaluated nodes and the height of
the branch-and-bound tree, respectively. The remaining columns give the running
times in seconds of our Lagrangian based approach and the CPLEX based method
as well as the ratio of running times S. Note that we include the time spent in the
local search heuristic for the Lagrangian branch-and-bound approach.

¹ http://www.cplex.com

Instance                Lagrangian B&B      CPLEX
Name   #res  #rot    N    H     time/s     time/s     S
1aac    105  1523    2    1       1.73       3.40   2.0
1aho     64   981    1    0       0.01       0.02   2.0
1b9o    123  2056    3    1       2.09       2.56   1.2
1c5e     95  1108    1    0       0.12       0.25   2.1
1c9o     66  1130    2    1       0.33       1.96   5.9
1cc7     72  1396    1    0       0.28       0.59   2.1
1cex    197  2556    9    2      13.37      33.25   2.5
1cku     85  1093    1    0       0.03       0.09   3.0
1ctj     89  1021    1    0       0.07       0.27   3.9
1cz9    139  2332    1    0        3.7      18.10   4.9
1czp     98  1170    1    0       0.54       4.32   8.0
1d4t    104  1636    1    0       0.37       2.36   6.4
1igd     61   926    1    0       0.01       0.02   2.0
1mfm    153  2134   25    5      21.89     145.63   6.7
1plc     99  1156    2    1       1.50       6.08   4.1
1qj4    256  4080  313   10   8,424.56  31,636.40   3.8
1qq4    198  2045   16    4      32.56      38.89   1.2
1qtn    152  2516    1    0       1.50       3.22   2.1
1qu9    126  1817    2    1       0.31       0.66   2.1
1rcf    169  2396    2    1       4.76      12.85   2.7
1vfy     67   939    1    0       0.01       0.01   1.0
2pth    193  3077   66    6     322.28     518.51   1.6
3lzt    129  2074    7    2       3.20      10.64   3.3
5p21    166  2874   52    4     106.09     115.01   1.1
7rsa    124  1958    1    0       0.78       3.31   4.2

Table 1: Running times of our Lagrangian relaxation branch-and-bound approach and the
CPLEX based method on the design instances from [14]. We further give the number of
residues (#res) and the total number of rotamers (#rot) of the instance, the number of nodes
(N) and height (H) of the branch-and-bound tree as well as the speedup S (ratio of running
times).

On the first dataset, our Lagrangian based approach outperforms the state-of- 
the-art CPLEX based method on all 25 instances. The small number of nodes eval- 
uated in the course of the branch-and-bound procedure indicates sharp lower and 
upper bounds derived from the Lagrangian solutions. On the more challenging sec- 
ond dataset, our method could solve 52 of the 97 instances within 12 h, whereas the 
CPLEX based method could only finish 12 instances within the time and memory 
limits. 

Instance                Lagrangian B&B      CPLEX
Name   #res  #rot    N    H     time/s     time/s     S
1brf     44  3524    9    4     293.97     469.87   1.6
1bx7     25  1048    1    0       0.54       5.77  10.7
1d3b     66  5732    1    0     530.37   9,577.68  18.1
1en2     59  2689    1    0      19.41      39.94   2.1
1eze     58  1653    2    1     185.11     441.23   2.4
1g6x     51  3190    1    0      23.96     160.64   6.7
1gcq     65  5442    4    2     903.82   5,270.08   9.8
1i07     52  3186    4    1     187.45     166.20   0.9
1kth     49  3330   18    4     798.57     642.42   0.8
1rb9     43  3307    7    2     127.93   9,535.72  74.5
1sem     54  4348  192    8   5,020.55   6,470.37   1.3
1vfy     58  3951   16    2   2,540.86          †   n/a
4rxn     45  3636    1    0     220.33   3,034.57  13.8
1a8o     62  4510    6    2   1,418.71          †   n/a
1b67     66  5543   27    4   3,822.09          †   n/a
1bbz     52  3935    7    2   1,329.26          †   n/a
1bf4     60  5289   12    3   1,875.32          †   n/a
1c75     63  4323   25    2   7,175.69          †   n/a
1cc8     69  6515   26    2  16,508.10          †   n/a
1d3b     66  5732    1    0     530.37          †   n/a
1fr3     61  5100   22    4   6,997.76          †   n/a
1gut     62  4945   22    2   7,745.17          †   n/a
1hg7     65  5047    5    2     987.51          †   n/a
1i27     69  5934   39    6   4,070.20          †   n/a
1igd     60  5207   18    4   3,163.14          †   n/a
1igq     53  4582   18    5   4,294.16          †   n/a
1iqz     75  5412   15    2   2,137.58          †   n/a
1j75     55  4861   14    4   5,704.83          †   n/a
1jo8     54  4680   41    4   1,830.23          †   n/a
1kql     58  5244    7    1   3,990.65          †   n/a
1191     69  5518    4    2   1,514.50          †   n/a
1ldd     71  6383    8    1   4,582.84          †   n/a
1ljo     69  6428   14    3   5,624.10          †   n/a
1mhn     53  4454    3    2     570.15          †   n/a
1nkd     56  4148    9    4   1,119.56          †   n/a
1oai     56  4330   73    3  20,021.10          †   n/a
1plc     92  7955   34    4  24,308.50          †   n/a
1pwt     58  4876    2    1     886.94          †   n/a
1r69     60  4926   49    2  27,862.50          †   n/a
1wap     65  5551   14    3   5,267.68          †   n/a
2igd     59  5262   13    3   6,332.26          †   n/a
1c4q     65  5598  843   10     30,789          †   n/a
1c9o     60  5305   97    8  11,627.20          †   n/a
1et]     84  6232  265   10   7,083.65          †   n/a
1dj7     69  5571   22    4  12,581.90          †   n/a
1e0b     58  4715  147    6  30,840.10          †   n/a
1erv    101  9150   76    8  31,539.70          †   n/a
1fk5     81  5714   28    5  14,625.30          †   n/a
1g2b     59  4926  129    8   6,880.45          †   n/a
1mgq     71  6250    4    1   2,613.06          †   n/a
1vie     56  4803    3  111     866.54          †   n/a

Table 2: Running times of our Lagrangian relaxation branch-and-bound approach and the
CPLEX based method on the design instances from [28]. Instances that neither approach
could solve within the time limit of 12 hours are omitted. The † sign indicates that a
computation exceeded either the time limit (12 h) or the memory limit (16 GB).



5 Conclusions and Outlook 



We have constructed a Lagrangian relaxation of the Kingsford ILP formulation of 
the SCP problem that allowed us to obtain strong bounds by solving a modified 
shortest path problem on the underlying $k$-partite graph. By utilizing these bounds
within a branch-and-bound framework we achieved running times that outperform 
a state-of-the-art exact method that uses the professional mathematical program- 
ming solver CPLEX. Our implementation of the Lagrangian branch-and-bound ap- 
proach as well as the data sets used in this paper are freely available as the package 
scp of the planet lisa software library [15].

Future work on exact side-chain placement should explore possible connections 
to the recently introduced method by Sontag et al. [22], which is based on belief
propagation. This heuristic algorithm is currently the best non-exact method and 
finds optimal solutions astonishingly often. In our opinion, underlying ideas from 
the area of belief propagation may be useful also in a truly exact method. 

The mathematical model of the SCP problem as studied in this work appears in a 
wide range of applications including image understanding, error correcting codes, 
and frequency assignment in telecommunications. We believe that our approach 
can be applied successfully in these areas, too. 